Adel Abu Hashim & Mahmoud Nagy - August 2021
This case study aims to help Amber Heard by analyzing newly created accounts posting/commenting against the victim of a social-bot disinformation/influence operation.
We have four main datasets (all scraped from Reddit):
1. Submissions & comments data (2021).
2. Users data (from 2006 to 2021).
3. A merged dataset (submissions & comments joined with users data).
4. Daily creation data (number of accounts created per day, 2006 to 2021).
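Dataset 3 is datasets 1 and 2 joined on the contribution's author. A minimal sketch of that join, assuming the `author` and `user_name` column names used later in this notebook:

```python
import pandas as pd

# Toy stand-ins for the contributions (dataset 1) and users (dataset 2) files.
contribs = pd.DataFrame({
    "author": ["userA", "userB"],
    "text": ["first comment", "second comment"],
})
users = pd.DataFrame({
    "user_name": ["userA", "userB"],
    "user_created_at": ["2019-06-23", "2020-01-01"],
})
# Left join: every contribution keeps its row, enriched with account data.
merged = contribs.merge(users, left_on="author", right_on="user_name", how="left")
print(merged.shape)  # (2, 4)
```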
#import dependencies
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sb
import os
import glob
import helpers
import matplotlib.dates as mdates
import plotly.express as px
import plotly.graph_objects as go
import re
import warnings
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
warnings.filterwarnings('ignore')
from wordcloud import WordCloud
sb.set_style("darkgrid")
%matplotlib inline
# load data
df = pd.read_csv("cleaned_data/reddit_cleaned_2021.csv")
df_merged = pd.read_csv("cleaned_data/reddit_merged_2021.csv")
df_users = pd.read_csv("cleaned_data/users_cleaned.csv")
df.shape
(18305, 17)
df_merged.shape
(18305, 24)
# convert to datetime
df.created_at = pd.to_datetime(df.created_at)
df_merged.created_at = pd.to_datetime(df_merged.created_at)
df_merged.user_created_at = pd.to_datetime(df_merged.user_created_at)
df_users.user_created_at = pd.to_datetime(df_users.user_created_at)
# Filter the DataFrame on negative nltk and blob
df_negative = df.query(" sentiment_blob == sentiment_nltk == 'Negative' ")
print(df_negative.shape)
df_negative.head(1);
(2016, 17)
# Filter the Merged DataFrame on negative nltk and blob
df_merged_negative = df_merged.query(" sentiment_blob == sentiment_nltk == 'Negative' ")
print(df_merged_negative.shape)
df_merged_negative.head(1);
(2016, 24)
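The filters above rely on `DataFrame.query` supporting Python-style chained comparisons: a row is kept only when both models agree *and* the agreed label is 'Negative'. A toy sketch:

```python
import pandas as pd

toy = pd.DataFrame({
    "sentiment_blob": ["Negative", "Negative", "Positive"],
    "sentiment_nltk": ["Negative", "Positive", "Positive"],
})
# Chained comparison: blob == nltk AND nltk == 'Negative' must both hold.
both_negative = toy.query(" sentiment_blob == sentiment_nltk == 'Negative' ")
print(len(both_negative))  # 1
```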
Reddit Comments/Submission Data
The number of negative parent comments on submissions
px.bar(data_frame=df_negative['top_level'].value_counts().to_frame().reset_index(),
x="index", y="top_level").update_layout(title='Negative Comment or Submission (Top Level / Parent)',
xaxis_title='contribution top level / parent category',
yaxis_title='number of negative contributions').update_traces(marker_color='#5296dd')
This means we have about 850 negative top-level comments, i.e. direct replies to submissions rather than replies to other comments.
1- The number of submissions vs. the number of comments
px.bar(data_frame=df_negative['submission_comment'].value_counts().to_frame().reset_index(),
x="index", y="submission_comment", color='submission_comment').update_layout(title='Comment or Submission (negative)',
xaxis_title='contribution category',
yaxis_title='number of negative contributions').update_traces(marker_color='#5296dd')
2- NLTK vs. TextBlob classification
print('Unique classes of gathered models')
(df['sentiment_blob'] + ' ' + df['sentiment_nltk']).value_counts().to_frame().reset_index()
#.style.applymap(lambda x: helpers.coloring(x, {'Positive Positive': 'green', 'Negative Negative':'red',
# 'Neutral Neutral': 'orange'}), subset=['index'])
Unique classes of gathered models
| index | 0 | |
|---|---|---|
| 0 | Neutral Neutral | 4862 |
| 1 | Positive Neutral | 3318 |
| 2 | Positive Positive | 2676 |
| 3 | Negative Neutral | 2191 |
| 4 | Negative Negative | 2016 |
| 5 | Neutral Negative | 1266 |
| 6 | Neutral Positive | 906 |
| 7 | Positive Negative | 712 |
| 8 | Negative Positive | 358 |
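The same agreement table can be read more directly from `pd.crosstab`, where the diagonal holds the rows both models classify identically; a sketch with toy labels:

```python
import pandas as pd

toy = pd.DataFrame({
    "sentiment_blob": ["Neutral", "Positive", "Negative", "Neutral"],
    "sentiment_nltk": ["Neutral", "Neutral", "Negative", "Positive"],
})
# Rows: TextBlob label, columns: NLTK label; the diagonal is where they agree.
agreement = pd.crosstab(toy["sentiment_blob"], toy["sentiment_nltk"])
print(agreement.loc["Negative", "Negative"])  # 1
```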
df_sentiment = df.query('sentiment_blob == sentiment_nltk')
print(df_sentiment.shape)
df_sentiment.head()
(9554, 17)
| child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | t1_ghnpthf | /r/CelebBattles/comments/ko1duk/marvel_vs_dc_t... | 10/10 on layout and choices. Very good, op. \n... | t3_ko1duk | r/CelebBattles | Caped_Baldy91 | 2021-01-01 00:30:16 | Positive | Positive | 16.0 | submission | comment | 13 | marvel_vs_dc_team_marvel_evangeline_lilly_brie | 8 | [] | 0 |
| 5 | t1_ghnr4bt | /r/WouldYouRather/comments/knz2cw/would_you_ra... | r/redditmoment | t3_knz2cw | r/WouldYouRather | SlavShinigamii | 2021-01-01 00:42:44 | Neutral | Neutral | 3.0 | submission | comment | 1 | would_you_rather_slap_amber_heard_until_she | 8 | [] | 0 |
| 8 | t3_ko2lp6 | /r/trendandstyle/comments/ko2lp6/the_stands_am... | The Stand's Amber Heard Sheds Light on Nadine'... | NaN | r/trendandstyle | templederr | 2021-01-01 01:39:37 | Positive | Positive | 1.0 | NaN | submission | 12 | the_stands_amber_heard_sheds_light_on_nadines | 8 | [] | 0 |
| 9 | t1_ghnymy7 | /r/CelebBattles/comments/ko1duk/marvel_vs_dc_t... | DC | t3_ko1duk | r/CelebBattles | masseffect2001 | 2021-01-01 01:57:50 | Neutral | Neutral | 3.0 | submission | comment | 1 | marvel_vs_dc_team_marvel_evangeline_lilly_brie | 8 | [] | 0 |
| 10 | t3_ko33pc | /r/WouldYouRather/comments/ko33pc/wyr_savagely... | WYR savagely beat up Amber Heard or save Johnn... | NaN | r/WouldYouRather | canadianreject565 | 2021-01-01 02:12:34 | Neutral | Neutral | 4.0 | NaN | submission | 11 | wyr_savagely_beat_up_amber_heard_or_save_johnny | 9 | [] | 0 |
df_sentiment.drop(columns='sentiment_blob', inplace=True)
df_sentiment.rename(columns={'sentiment_nltk': 'sentiment'}, inplace=True)
print('Unified model data frame')
print(df_sentiment.shape)
df_sentiment.head(2) # .style.applymap(lambda x: helpers.coloring(x, {'Positive': '#deebce', 'Negative': '#edbcbb'}), subset=[ 'sentiment'])
Unified model data frame
(9554, 16)
| child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | t1_ghnpthf | /r/CelebBattles/comments/ko1duk/marvel_vs_dc_t... | 10/10 on layout and choices. Very good, op. \n... | t3_ko1duk | r/CelebBattles | Caped_Baldy91 | 2021-01-01 00:30:16 | Positive | 16.0 | submission | comment | 13 | marvel_vs_dc_team_marvel_evangeline_lilly_brie | 8 | [] | 0 |
| 5 | t1_ghnr4bt | /r/WouldYouRather/comments/knz2cw/would_you_ra... | r/redditmoment | t3_knz2cw | r/WouldYouRather | SlavShinigamii | 2021-01-01 00:42:44 | Neutral | 3.0 | submission | comment | 1 | would_you_rather_slap_amber_heard_until_she | 8 | [] | 0 |
common_df = df_sentiment.sentiment.value_counts().to_frame().reset_index()
common_df.columns = ['class', 'sentiment']
common_df
| class | sentiment | |
|---|---|---|
| 0 | Neutral | 4862 |
| 1 | Positive | 2676 |
| 2 | Negative | 2016 |
fig = px.histogram(data_frame=common_df,
                   x='class', y='sentiment', opacity=1,
                   title='Common results between NLTK and TextBlob').update_traces(marker_color='#5296dd')
fig.show()
fig = go.Figure()
fig.add_trace(go.Histogram(x=df.sentiment_blob, name='BLOB'))
fig.add_trace(go.Histogram(x=df.sentiment_nltk, name= 'NLTK'))
# With the default grouped bars both histograms stay visible side by side
fig.update_traces(opacity=1)
fig.update_layout(
title_text='BLOB VS NLTK', # title of plot
xaxis_title_text='Class', # xaxis label
yaxis_title_text='Number of contributions', # yaxis label
bargap=0.2, # gap between bars of adjacent location coordinates
bargroupgap=0.1 # gap between bars of the same location coordinates
)
fig.show()
temp_seniment_df = df['sentiment_nltk'].value_counts().to_frame().reset_index()
temp_seniment_df = temp_seniment_df.merge(df['sentiment_blob'].value_counts().to_frame().reset_index())
temp_seniment_df
| index | sentiment_nltk | sentiment_blob | |
|---|---|---|---|
| 0 | Neutral | 10371 | 7034 |
| 1 | Negative | 3994 | 4565 |
| 2 | Positive | 3940 | 6706 |
explode = [0, 0, 0.1]
fig, (ax1, ax2) = plt.subplots(1, 2, figsize = (15,8))
ax1.pie(temp_seniment_df.sentiment_nltk,
labels = temp_seniment_df['index'],  # class names live in the 'index' column after reset_index
autopct='%1.1f%%',
explode = explode,
textprops={'fontsize': 12},
# textprops={'color':"w"},
colors = ['#adcde1', '#bcdd93', '#ee9e9c'],
startangle=180)
ax2.pie(temp_seniment_df.sentiment_blob,
labels = temp_seniment_df['index'],  # class names live in the 'index' column after reset_index
colors = ['#adcde1', '#bcdd93', '#ee9e9c'],
autopct='%1.1f%%',
explode = explode,
textprops={'fontsize': 12},
startangle=180)
ax1.set_title('NLTK', fontdict = {'fontsize' : 18})
ax2.set_title('BLOB', fontdict = {'fontsize' : 18})
plt.show()
category_names = temp_seniment_df['index']  # ['Neutral', 'Negative', 'Positive']
dict_models = {
'NLTK': temp_seniment_df.sentiment_nltk,
'BLOB': temp_seniment_df.sentiment_blob
}
def models(dict_models, category_names):
labels = list(dict_models.keys())
data = np.array(list(dict_models.values()))
data_cum = data.cumsum(axis=1)
category_colors = plt.get_cmap('Paired')(
np.linspace(0.08, 0.40, data.shape[1]))
fig, ax = plt.subplots(figsize=(10, 5))
ax.invert_yaxis()
ax.xaxis.set_visible(False)
ax.set_xlim(0, np.sum(data, axis=1).max())
for i, (colname, color) in enumerate(zip(category_names, category_colors)):
widths = data[:, i]
starts = data_cum[:, i] - widths
ax.barh(labels, widths, left=starts, height=0.5,
label=colname, color=color)
xcenters = starts + widths / 2
r, g, b, _ = color
text_color = 'white' if r * g * b < 0.25 else 'black'
for y, (x, c) in enumerate(zip(xcenters, widths)):
ax.text(x, y, str(int(c)), ha='center', va='center',
color=text_color)
ax.legend(ncol=len(category_names), bbox_to_anchor=(0, 1),
loc='lower left', fontsize=15)
ax.tick_params(labelsize=14)
sb.set_context('paper', font_scale=2) # --> just scaling all of the font sizes
return fig, ax
models(dict_models, category_names)
plt.show()
We can see that the Negative class yields similar counts in both models.
3- Investigate the text column
# pd.set_option('display.max_colwidth', None)
suspected_dict = {}
df_fuc = df[df.text.str.lower().str.contains('fuck')]
print(df_fuc.shape)
df_fuc.head()
(1227, 17)
| child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 54 | t1_gho9bpc | /r/CelebBattles/comments/ko1duk/marvel_vs_dc_t... | Fuck a nd I thought I didn't have a life LOL | t1_gho5bap | r/CelebBattles | Beav365 | 2021-01-01 03:57:09 | Positive | Negative | -2.0 | comment | comment | 11 | marvel_vs_dc_team_marvel_evangeline_lilly_brie | 8 | [] | 0 |
| 88 | t1_ghoms5q | /r/redditmoment/comments/ko1xfy/delete_tik_tok... | Shut the fuck up about this stealing your priv... | t1_ghoe5zj | r/redditmoment | big-shaq-skrra | 2021-01-01 06:54:36 | Negative | Negative | 56.0 | comment | comment | 28 | delete_tik_tok_or_slap_amber_heard | 7 | [] | 0 |
| 105 | t1_ghooy78 | /r/redditmoment/comments/ko1xfy/delete_tik_tok... | What the fuck | t1_ghon57f | r/redditmoment | Bombz_Armed | 2021-01-01 07:28:39 | Negative | Negative | 55.0 | comment | comment | 3 | delete_tik_tok_or_slap_amber_heard | 7 | [] | 0 |
| 110 | t1_ghopc1o | /r/Teenager/comments/knh0wt/just_cuz_i_wanna_k... | who the fuck voted ofr amber heard | t3_knh0wt | r/Teenager | matthew35433ma | 2021-01-01 07:34:53 | Negative | Negative | 1.0 | submission | comment | 7 | just_cuz_i_wanna_know_how_many_of_you_support | 10 | [] | 0 |
| 142 | t1_ghot913 | /r/redditmoment/comments/ko1xfy/delete_tik_tok... | I'd fuck her for sure | t3_ko1xfy | r/redditmoment | FatHandNoticer | 2021-01-01 08:45:21 | Positive | Neutral | -11.0 | submission | comment | 5 | delete_tik_tok_or_slap_amber_heard | 7 | [] | 0 |
# get the authors of submissions whose title is exactly 'fuck_amber_heard'
mask = (df['submission_text'] == 'fuck_amber_heard') & (df['submission_comment'] == 'submission')
df_sub = df[mask]
print(df_sub.shape)
with pd.option_context('display.max_colwidth', None):
display(df_sub.head())
(7, 17)
| child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 684 | t3_kpbsbr | /r/Animemes/comments/kpbsbr/fuck_amber_heard/ | Fuck amber heard | NaN | r/Animemes | dingusbob69 | 2021-01-03 02:38:42 | Negative | Negative | 269.0 | NaN | submission | 3 | fuck_amber_heard | 3 | [] | 0 |
| 889 | t3_kq9lal | /r/memes/comments/kq9lal/fuck_amber_heard/ | fuck Amber Heard | NaN | r/memes | guyinAmerica1 | 2021-01-04 14:18:40 | Negative | Negative | 1.0 | NaN | submission | 3 | fuck_amber_heard | 3 | [] | 0 |
| 1404 | t3_kriljq | /r/JusticeForJohnnyDepp/comments/kriljq/fuck_amber_heard/ | fuck amber heard | NaN | r/JusticeForJohnnyDepp | isaac0304 | 2021-01-06 06:59:05 | Negative | Negative | 33.0 | NaN | submission | 3 | fuck_amber_heard | 3 | [] | 0 |
| 1853 | t3_ktkm8u | /r/EntitledBitch/comments/ktkm8u/fuck_amber_heard/ | Fuck amber heard | NaN | r/EntitledBitch | big_pog_human2478 | 2021-01-09 05:17:34 | Negative | Negative | 14738.0 | NaN | submission | 3 | fuck_amber_heard | 3 | [] | 0 |
| 2389 | t3_kukxmy | /r/SupportAmberHeard/comments/kukxmy/fuck_amber_heard/ | FUCK AMBER HEARD | NaN | r/SupportAmberHeard | Flerp6969 | 2021-01-10 19:28:00 | Negative | Negative | 54.0 | NaN | submission | 3 | fuck_amber_heard | 3 | [] | 0 |
# get the authors of submissions whose title contains 'fuck'
mask = (df['submission_text'].str.contains('fuck')) & (df['submission_comment'] == 'submission')
df_sub_fuc = df[mask]
print(df_sub_fuc.shape)
with pd.option_context('display.max_colwidth', None):
display(df_sub_fuc.head())
(49, 17)
| child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 183 | t3_ko9ymn | /r/memes/comments/ko9ymn/fuck_amber_heard_credit_to_uwhitewolf699420_for/ | Fuck Amber Heard (credit to u/Whitewolf699420 for the template) | NaN | r/memes | -banned- | 2021-01-01 11:39:40 | Negative | Negative | 1.0 | NaN | submission | 9 | fuck_amber_heard_credit_to_uwhitewolf699420_for | 7 | [] | 0 |
| 186 | t3_ko9zmj | /r/dankmemes/comments/ko9zmj/fuck_amber_heard_credit_to_uwhitewolf699420_for/ | Fuck Amber Heard (credit to u/Whitewolf699420 for template) | NaN | r/dankmemes | -banned- | 2021-01-01 11:41:46 | Negative | Negative | 1.0 | NaN | submission | 8 | fuck_amber_heard_credit_to_uwhitewolf699420_for | 7 | [] | 0 |
| 684 | t3_kpbsbr | /r/Animemes/comments/kpbsbr/fuck_amber_heard/ | Fuck amber heard | NaN | r/Animemes | dingusbob69 | 2021-01-03 02:38:42 | Negative | Negative | 269.0 | NaN | submission | 3 | fuck_amber_heard | 3 | [] | 0 |
| 889 | t3_kq9lal | /r/memes/comments/kq9lal/fuck_amber_heard/ | fuck Amber Heard | NaN | r/memes | guyinAmerica1 | 2021-01-04 14:18:40 | Negative | Negative | 1.0 | NaN | submission | 3 | fuck_amber_heard | 3 | [] | 0 |
| 1404 | t3_kriljq | /r/JusticeForJohnnyDepp/comments/kriljq/fuck_amber_heard/ | fuck amber heard | NaN | r/JusticeForJohnnyDepp | isaac0304 | 2021-01-06 06:59:05 | Negative | Negative | 33.0 | NaN | submission | 3 | fuck_amber_heard | 3 | [] | 0 |
df_sub_fuc_contributions = df_sub_fuc.groupby(df_sub_fuc.created_at.dt.date).size().reset_index(name='n_contributions')
fig = px.bar(df_sub_fuc_contributions,
x='created_at',
y='n_contributions', title='The number of submissions with the word "F*CK" in 2021')
fig.update_traces(marker_color='red', marker_line_width=2, opacity=1, textposition='auto')
# , marker_line_color='#5296dd'
fig.show()
df_fuc.author.value_counts().head(10)
-banned-                86
Jreal22                 10
Loveseeingthatsmile      6
AutoModerator            5
VampireQueenDespair      4
zephyrBoom72             4
EdwardCircumcizehand     4
blackweebow              4
Stanley_Elkind           4
WayneTedrowJunior        4
Name: author, dtype: int64
df_fuc.submission_comment.value_counts()
comment       1150
submission      77
Name: submission_comment, dtype: int64
df_fuc.subreddit.value_counts().head(10)
r/JerkOffToCelebs         188
r/entertainment           154
r/pussypassdenied         151
r/MensRights               82
r/TrueOffMyChest           66
r/iamatotalpieceofshit     61
r/DC_Cinematic             27
r/awfuleverything          27
r/CelebAssPussyMouth       27
r/AskReddit                26
Name: subreddit, dtype: int64
df_fuc.created_at.dt.date.value_counts().head(10)
2021-04-17    73
2021-02-28    72
2021-03-04    60
2021-05-11    45
2021-02-20    40
2021-01-16    34
2021-04-18    25
2021-05-08    25
2021-04-01    24
2021-01-01    23
Name: created_at, dtype: int64
df_fuc_contributions = df_fuc.groupby(df_fuc.created_at.dt.date).size().reset_index(name='n_contributions')
fig = px.bar(df_fuc_contributions,
x='created_at',
y='n_contributions', title='The number of "F*CK" contributions in 2021')
fig.update_traces(marker_color='red', marker_line_width=2, opacity=1, textposition='auto')
# , marker_line_color='#5296dd'
fig.show()
The user Jreal22 used the word f*ck 10 times.
Negative comments by Jreal22
df_jeral = df.query(" author == 'Jreal22' ")
print(df_jeral.shape)
df_jeral.head()
(28, 17)
| child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 13069 | t1_guu4ilo | /r/entertainment/comments/msgoz8/johnny_depp_r... | She was feeding him Xanax constantly too, you ... | t1_gutc2jp | r/entertainment | Jreal22 | 2021-04-17 11:43:17 | Negative | Neutral | 60.0 | comment | comment | 76 | johnny_depp_releases_lapd_bodycam_footage_proving | 7 | [] | 0 |
| 13073 | t1_guu4sfw | /r/entertainment/comments/msgoz8/johnny_depp_r... | Lots of people have their lines fed to them, i... | t1_gutd28y | r/entertainment | Jreal22 | 2021-04-17 11:45:13 | Negative | Negative | 6.0 | comment | comment | 49 | johnny_depp_releases_lapd_bodycam_footage_proving | 7 | [] | 0 |
| 13076 | t1_guu57ix | /r/entertainment/comments/msgoz8/johnny_depp_r... | Yeah, I couldn't believe this.\n\nThey tried t... | t1_guthn8f | r/entertainment | Jreal22 | 2021-04-17 11:48:22 | Positive | Neutral | 23.0 | comment | comment | 58 | johnny_depp_releases_lapd_bodycam_footage_proving | 7 | [] | 0 |
| 13081 | t1_guu60bw | /r/entertainment/comments/msgoz8/johnny_depp_r... | Yeah, Depp needed to stop doing the same movie... | t1_guteoj8 | r/entertainment | Jreal22 | 2021-04-17 11:54:07 | Positive | Positive | 1.0 | comment | comment | 36 | johnny_depp_releases_lapd_bodycam_footage_proving | 7 | [] | 0 |
| 13092 | t1_guu7wa8 | /r/entertainment/comments/msgoz8/johnny_depp_r... | Jay Z should have taken Chris brown out back a... | t1_gutexk2 | r/entertainment | Jreal22 | 2021-04-17 12:07:33 | Negative | Neutral | 8.0 | comment | comment | 39 | johnny_depp_releases_lapd_bodycam_footage_proving | 7 | [] | 0 |
df_jeral.text.value_counts().head(3)
Fantastic beasts is shit anyways, plus what the fuck happened to JK Rowling lol. She went on some crusade to destroy trans people, it was like uhhh, we get it you're not a fan, but ya don't have to publish articles shitting on them.    1
One thing I wish I could just get across in general to the people who just can't seem to put any blame on Amber is that there is a MASSIVE problem with women abusing men in America, that is just not talked about because women are who we hear about in the news getting beat up/killed by their male partners. We never hear about women beating men. \n\nSo there's this idea for people with no experience of it that women don't abuse men and it's just not the case. \n\nAnecdotal evidence is useless, I know, but in my life, I have witnessed more wives beating their husbands than ever men abusing their wives.\n\nAnd the guys have nowhere to go with the problem, they know that no one is going to believe that they can't control a 115 lb woman that is clawing at your face and smacking the shit out of you every night.\n\nThey don't want to talk to their friends about it, because then their friends think less of them or the friend tells their wife and the wife thinks the problem is the other way around and the woman is who is getting abused. \n\nWomen's problem with being abused is that when they're abused, they get badly hurt physically and we hear about it or we see it. \n\nBut when women abuse men, it's a shit load of psychological abuse and the physical abuse you aren't hearing about because men think it's their responsibility to "control" their wives, or they simply can't admit that their wives beat them, just like women have such a hard time telling friends/police that their boyfriends/husbands are abusing them.\n\nWomen abusing men is just not talked about in the news, and it's not shown in television and movies, so less people understand how just as terrible a problem it is. \n\nI guess because I watched my mom and friends of hers abuse their husbands so much growing up, I understand the other side of spousal abuse, and so when I see all these red flags from Amber, I recognize that the problem is clearly leaning towards her, while Depp is obviously a part of the problem as well.\n\nBut the history of women covering up black eyes with glasses and makeup is so commonly talked about and referenced in pop culture, the other side of that coin is just not talked about, and I think that's just disappointing, because I spent 20+ years of my life watching my mom abuse myself and my dad and then blame us when she had bruises on her body, or when she fell trying to hit my dad once and broke her wrist. \n\nWho did she blame when she broke her own wrist? My dad of course, who was trying to avoid being hit in the face again by her.\n\nThat's all I have to say on this stuff, I just want to get across the issue that I've seen throughout my life, and the reason I even say anything at all is because I see my own experience in Amber's abuse, and it's like she gets a free pass - doesn't lose movie roles like Depp, lies about donating the money Depp gave her to charity, and the charity even calls her out on it and still, no one cares. \n\nIt's just like, what does it take to get people to just admit that Amber is an abusive person, who has lied extensively, and has done massive damage to the metoo movement. \n\nIt makes me think, if my wife abused me, and I came out about it with literal video and audio proof of it, and still everyone blamed me and destroyed my life, how terrible would that be? That's what's happening to Depp, it's more complicated than that obviously, but I just imagine it's got to be nuts to be able to provide all this proof and yet people still give her a pass. \n\nIf they both were abusive, then punish them equally, that's all I'm asking here. \n\nWish everyone the best, I truly mean that.    1
"The actor told the High Court Ms Heard, 34, threw a vodka bottle at him which cut the top of his finger and "crushed the bones". \n\nThis was also confirmed by Amber on their recordings for their therapist that SHE was recording. \n\nAnd, I'm suddenly wrong because I've seen how a woman abuser gets away with being the main abuser in an abusive relationship? \n\nWith her having a previous relationship where her female partner admitted Amber beat her in an airport. \n\nYeah, maybe that means I know much more than you would about how men have to defend themselves from women like Amber that will literally chase you down (this was in the recorded court documents: she chased Johnny Depp into multiple rooms during a physical fight, where he and Heard both stated he ended up getting a different hotel room than her and additional security because she would not stop beating him, this was all recorded on HER tape).\n\nRead more, you'll look less like a moron defending an admitted abuser. She admitted on her own recording that she physically abused him and called him a "pussy" for running away from her during the fights.\n\nJust listen to one of Amber's recordings, it literally proves she's the abuser, without a shadow of a doubt. I'm not asking for much, listen to one damn recording. They're all on YouTube.\n\nI just can't imagine how anyone can listen to Amber sit there recording herself admitting to everything and still defend her.\n\nI don't care about Depp or Heard, but it drives me crazy that people can't just listen to Amber admit to everything that Johnny Depp has laid out.\n\nI'm not some asshole moron, I've simply seen this behavior first hand, and I've listened to all of Amber's recordings, which she recorded with Depp's permission to help in couple's therapy.    1
Name: text, dtype: int64
suspected_dict['Jreal22'] = 'Created at: 2019-06-23, used the word f*ck in 28 negative comments on Apr 17-18, 2021'
# Check this user's account data
df_users.query(" user_name == 'Jreal22' ")
| user_name | has_verified_email | is_mod | is_gold | is_banned | comment_karma | link_karma | user_created_at | banned_unverified | creation_year | |
|---|---|---|---|---|---|---|---|---|---|---|
| 49493 | Jreal22 | True | False | False | False | 25602.0 | 108.0 | 2019-06-23 14:59:56 | others | 2019 |
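Jreal22's account was created on 2019-06-23, well before its 2021 activity, but account age at posting time is a standard signal in this kind of investigation: influence operations often rely on freshly created accounts. A sketch with toy dates, assuming the merged frame's `created_at`/`user_created_at` columns:

```python
import pandas as pd

merged = pd.DataFrame({
    "created_at":      pd.to_datetime(["2021-04-17", "2021-01-03"]),
    "user_created_at": pd.to_datetime(["2021-04-10", "2021-01-01"]),
})
# Account age (in days) at the moment of posting; very young accounts
# posting negative content are a classic influence-operation signal.
merged["account_age_days"] = (merged["created_at"] - merged["user_created_at"]).dt.days
print(merged["account_age_days"].tolist())  # [7, 2]
```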
df_jeral_contributions = df_jeral.groupby(df_jeral.created_at.dt.date).size().reset_index(name='n_contributions')
fig = px.bar(df_jeral_contributions,
x='created_at',
y='n_contributions', title='The number of "Jreal22" contributions in 2021')
fig.update_traces(marker_color='red', marker_line_width=2, opacity=1, textposition='auto')
# , marker_line_color='#5296dd'
fig.show()
# Check the value counts of the negative text
df_negative.text.value_counts().head(15)
Fuck Amber Heard     9
Fuck amber heard     8
Amber Heard bad      8
Amber Heard Is Allegedly Being Investigated By LAPD For Perjury, Could Face Jail Time Over Johnny Depp Domestic Violence Accusations    7
**Reminder: Do not ask for personal information, suggest someone should be doxxed, link to or comment with personal information, openly solicit personal information, or contact the people featured here. Don't even wax poetic about wanting to post identifying information. You will be banned.**\n\n**Do not encourage, glorify, or incite violence.**\n\nFor example: "Kill yourself", "It wouldn't be so bad if we killed all the pedophiles", "This guy needs to die", "I hope this guy gets stabbed to death with a rusty screwdriver", etc.\n\nAll glorification, advocacy, or suggestions of violence, EVEN IN JEST, will be permanently banned, no exceptions, and no possibility of leniency. \n\n**ALL PERSONALLY IDENTIFYING INFORMATION MUST BE CENSORED.**\n\nFailure to follow the rules of this sub will result in a permanent ban.\n \n\n\n*I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/iamatotalpieceofshit) if you have any questions or concerns.*    5
Fuck Amber Heard.    5
Fuck her             5
Amber heard bad      4
fuck amber heard     4
FBI TARGETS Amber Heard; CONFIRMS Australian FEDERAL CRIMINAL CASE!    4
Wtf                  4
Fuck that bitch      4
amber heard bad      3
Cara Delevingne, Sommer Ray, Scarlett Johansson, Cardi B, Kaley Cuoco, and Amber Heard. 1: Cowgirl dressed as one of their characters, 2: Pile driver anal, 3: Public Airtight Gangbang, 4: Lapdance/Titfuck, 5: Brutal throatfuck, 6: Your own fuck method. Be as detailed as possible    3
downvote this comment if the meme sucks. upvote it and I'll go away.\n\n---\n\nr/dankmemescraft Minecraft server! Out now!    3
Name: text, dtype: int64
negative_words_count = df_negative.text.str.split(expand=True).stack().value_counts().to_frame().reset_index().rename(columns={'index': 'word', 0:'count'})
negative_words_count['word'] = negative_words_count['word'].apply(lambda x: x.lower())
negative_words_count['word'] = negative_words_count['word'].str.replace(r'[^\w\s]', '', regex=True)
stop_words = set(stopwords.words('english'))
filtered = negative_words_count[~negative_words_count.word.isin(stop_words)]
# Map each word to its count directly (keying by count would drop tied counts)
frequency_dict = dict(zip(filtered['word'], filtered['count']))
wc = WordCloud(max_font_size=50, max_words=100, background_color="white").generate_from_frequencies(frequency_dict)
plt.figure(figsize=(15,10))
plt.imshow(wc)
plt.axis("off")
plt.show()
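An alternative way to build the frequency map for the word cloud is `collections.Counter`, which lower-cases, strips punctuation, and drops stop words in one pass while preserving words that happen to share a count; a toy sketch:

```python
import re
from collections import Counter

texts = ["Fuck Amber Heard!", "Amber Heard bad", "amber heard bad"]
stop_words = {"the", "a", "an"}

# Normalize each text, split into words, and drop stop words.
words = [
    w
    for t in texts
    for w in re.sub(r"[^\w\s]", "", t.lower()).split()
    if w not in stop_words
]
freq = Counter(words)
print(freq["amber"], freq["bad"])  # 3 2
```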
5- Investigate the Negative Submissions Text
# this returns the number of submission_text which got the same parent(comment/submission)
df_negative.groupby('parent_id')['submission_text'].value_counts().sort_values(ascending=False)
parent_id submission_text
t3_msgoz8 johnny_depp_releases_lapd_bodycam_footage_proving 29
t3_lu5055 amber_heard_has_been_fired_from_jason_momoas 26
t3_lx2s7w its_disgusting_that_people_are_less_angry_about 26
t3_n9lwnf amber_heard_still_in_aquaman_2_despite_proof_that 25
t3_kyj460 amber_heard_is_a_monster_a_gold_digger_looking 18
..
t1_gp68n40 amber_heard_has_been_fired_from_jason_momoas 1
t1_gp652no amber_heard_has_been_fired_from_jason_momoas 1
t1_gp64ygz amber_heard_has_been_fired_from_jason_momoas 1
t1_gp63zi8 amber_heard_has_been_fired_from_jason_momoas 1
t1_gutfs2m johnny_depp_releases_lapd_bodycam_footage_proving 1
Name: submission_text, Length: 1290, dtype: int64
df_same = df_negative.groupby('parent_id')['submission_text'].value_counts().to_frame()
df_same.columns = ['same_count']
df_same = df_same.reset_index()
df_same.sort_values('same_count', ascending=False, inplace=True)
df_same = df_same[df_same['parent_id'].str.startswith('t3')]
df_same
| parent_id | submission_text | same_count | |
|---|---|---|---|
| 1191 | t3_msgoz8 | johnny_depp_releases_lapd_bodycam_footage_proving | 29 |
| 1114 | t3_lu5055 | amber_heard_has_been_fired_from_jason_momoas | 26 |
| 1119 | t3_lx2s7w | its_disgusting_that_people_are_less_angry_about | 26 |
| 1246 | t3_n9lwnf | amber_heard_still_in_aquaman_2_despite_proof_that | 25 |
| 1011 | t3_kyj460 | amber_heard_is_a_monster_a_gold_digger_looking | 18 |
| ... | ... | ... | ... |
| 1026 | t3_l1qwsh | my_top_6random_order_natalie_portman_vs_emily | 1 |
| 1027 | t3_l26bip | amber_heard_emma_stone_kristen_stewart_and_mary | 1 |
| 1028 | t3_l4tdil | amber_heard_vs_bar_refaeli | 1 |
| 1030 | t3_l6xniu | meta_remember_the_amber_heard_situation_on_this | 1 |
| 1031 | t3_l7a0hx | fuck_amber_heard_but_this_really_made_me_cringe | 1 |
371 rows × 3 columns
7- Check the number of negative text words
Short texts of only a few words are, of course, easier for bots to mass-produce.
df_negative['text_words'].value_counts().head(10)
3     118
6      99
7      91
9      91
8      90
10     83
4      82
5      79
11     69
13     69
Name: text_words, dtype: int64
px.histogram(df_negative['text_words'].to_frame(), x="text_words",title='number of negative text words ',
nbins=250).update_traces(marker_color='#5296dd')
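A quick way to quantify the claim above is the share of negative texts at or below a word-count threshold; a toy sketch:

```python
import pandas as pd

text_words = pd.Series([3, 6, 7, 3, 4, 12, 25, 3])  # words per negative text
# Fraction of negative contributions that are 5 words or fewer.
short_share = (text_words <= 5).mean()
print(short_share)  # 0.5
```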
# The number of words in negative submission text
df_negative = df.query(" sentiment_blob == sentiment_nltk == 'Negative' ") #define again to add the "submission_words" column
df_negative_submission = df_negative.query(" submission_comment == 'submission' ")
df_negative_submission['submission_words'].hist(figsize=(8,8),bins=30);
plt.title('The number of words in negative submission text');
8- Most active users
df.author.value_counts().to_frame().head(10).reset_index()
| index | author | |
|---|---|---|
| 0 | -banned- | 1587 |
| 1 | AutoModerator | 515 |
| 2 | CelebBattleVoteBot | 163 |
| 3 | LoveAmberHeard42286 | 124 |
| 4 | charliedba | 99 |
| 5 | Stanley_Elkind | 44 |
| 6 | Truthbetheprejudice | 43 |
| 7 | gaul66 | 37 |
| 8 | sadwook | 32 |
| 9 | Beatplayer | 32 |
fig = px.bar(df.author.value_counts().to_frame().head(10).reset_index(), x="author", y="index",
             height=500,
             title='Most active users in 2021').update_layout(
    xaxis_title='number of contributions',
    yaxis_title='user name').update_traces(marker_color='#5296dd')
fig.update_yaxes(autorange="reversed")
AutoModerator is a system built into reddit that allows moderators to define "rules" (consisting of checks and actions) to be automatically applied to posts in their subreddit.
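Since AutoModerator posts are rule-driven notices rather than organic opinions, it may make sense to exclude known automated accounts before interpreting the sentiment counts; a sketch with an assumed bot list:

```python
import pandas as pd

toy = pd.DataFrame({
    "author": ["AutoModerator", "CelebBattleVoteBot", "Jreal22"],
    "text":   ["rules reminder", "vote tally", "a real comment"],
})
# Assumed list of known automated accounts to exclude from analysis.
KNOWN_BOTS = {"AutoModerator", "CelebBattleVoteBot"}
humans = toy[~toy["author"].isin(KNOWN_BOTS)]
print(len(humans))  # 1
```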
df_auto_moderator = df.query(" author == 'AutoModerator' ").reset_index(drop=True)
print(df_auto_moderator.shape)
df_auto_moderator.head()
(515, 17)
| child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | t1_gho6m2p | /r/AskReddit/comments/ko448x/i_strongly_enjoy_... | **PLEASE READ THIS MESSAGE IN ITS ENTIRETY BEF... | t3_ko448x | r/AskReddit | AutoModerator | 2021-01-01 03:25:15 | Positive | Positive | 1.0 | submission | comment | 155 | i_strongly_enjoy_amber_heard_as_a_person_and | 9 | ['http://www.reddit.com'] | 1 |
| 1 | t1_gho6wxk | /r/AskReddit/comments/ko45ym/imagine_a_person_... | **PLEASE READ THIS MESSAGE IN ITS ENTIRETY BEF... | t3_ko45ym | r/AskReddit | AutoModerator | 2021-01-01 03:28:45 | Positive | Positive | 1.0 | submission | comment | 155 | imagine_a_person_like_myself_that_strongly_enjoys | 8 | ['http://www.reddit.com'] | 1 |
| 2 | t1_ghogbmf | /r/memes/comments/ko5qc9/apkngxpraw_now_theres... | Hello /u/thatgamerguy567! Unfortunately, your ... | t3_ko5qc9 | r/memes | AutoModerator | 2021-01-01 05:25:21 | Positive | Neutral | 1.0 | submission | comment | 239 | apkngxpraw_now_theres_the_meeting_code_do_what | 8 | [] | 0 |
| 3 | t1_ghoh7cn | /r/memes/comments/ko5wmd/apkngxpraw_now_theres... | Rule 9 overused. No Johny Depp or Amber Heard ... | t3_ko5wmd | r/memes | AutoModerator | 2021-01-01 05:36:52 | Positive | Neutral | 1.0 | submission | comment | 37 | apkngxpraw_now_theres_the_meeting_code_do_what | 8 | [] | 0 |
| 4 | t1_ghoh8an | /r/memes/comments/ko5ws6/apkngxpraw_now_theres... | Rule 9 overused. No Johny Depp or Amber Heard ... | t3_ko5ws6 | r/memes | AutoModerator | 2021-01-01 05:37:14 | Positive | Neutral | 1.0 | submission | comment | 37 | apkngxpraw_now_theres_the_meeting_code_do_what | 8 | [] | 0 |
df_auto_moderator.subreddit.value_counts().head(10)
r/JerkOffToCelebs         176
r/memes                    55
r/Celebhub                 48
r/unpopularopinion         32
r/AskReddit                30
r/DC_Cinematic             26
r/CelebBattleLeague        22
r/OutOfTheLoop             18
r/iamatotalpieceofshit     17
r/jerkbudss                 6
Name: subreddit, dtype: int64
df_auto_moderator['permalink'].iloc[26]
'/r/darkjokes/comments/kq9r9d/if_relationship_of_amber_heard_and_johnny_depp/gi2itdv/'
df_auto_moderator.text.value_counts().head()
### [Browse JerkOffChallenges](https://jerkofftocelebs.com/actors/) • [Browse Picture Galleries](https://jerkofftocelebs.com/pictures/) • NEW [JerkOffToGermanCelebs](https://reddit.com/r/JerkOffToGermanCelebs/)\n\n\n^(*Thank you for your submission. Make sure to follow the rules.*) \n\n^(*Check out our Website*) ^[*here*](https://jerkofftocelebs.com/). \n\n^(*Join our Discord*) ^[*here*](https://discord.gg/FMhrH2j).\n\n^(*Explore more subreddits*) ^[*here*](https://jerkofftocelebs.com/reddit-nsfw-list/).\n\n\n*I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/JerkOffToCelebs) if you have any questions or concerns.*    161
Rule 9 overused. No Johny Depp or Amber Heard memes at this time\n\n*I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/memes) if you have any questions or concerns.*    52
Make sure to follow our community guidelines.\n\n**[Get user flair](https://redd.it/e0a4mn) and check out our [Instagram page](https://www.instagram.com/celebtag) for more celebrity content!**\n\n\n*I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/Celebhub) if you have any questions or concerns.*    42
Your submission has been automatically removed for not including a valid category/subcategory tag. Tags are essential to an optimal browsing experience for our users.\n\nSince your post was removed automatically, you are free to resubmit it with an appropriate tag. You can find the tagging guide [here](/r/DC_Cinematic/wiki/linkflair#wiki_automated_tagging). Add a valid and appropriate tag in your submission title. Choose wisely, as posts with misleading tags are subject to removal.\n\n**Message the moderators if your post was removed despite being tagged with an input from the category list.**\n\n\n*I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/DC_Cinematic) if you have any questions or concerns.*    26
Your opinion about celebrity relationships has been reposted dozens of times today. Please join a celebrity gossip sub. Thanks.\n\n*I am a bot, and this action was performed automatically. Please [contact the moderators of this subreddit](/message/compose/?to=/r/unpopularopinion) if you have any questions or concerns.*    24
Name: text, dtype: int64
CelebBattleVoteBot is an automated account that creates polls and vote links for celebrity "battle" threads; its contributions are bot output, not organic opinion.
df_vote_bot = df.query(" author == 'CelebBattleVoteBot' ").reset_index(drop=True)
print(df_vote_bot.shape)
df_vote_bot.head()
(163, 17)
| child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | t3_ko1ew0 | /r/CelebbattlePolls/comments/ko1ew0/marvel_vs_... | Marvel vs DC : Team Marvel (Evangeline Lilly B... | NaN | r/CelebbattlePolls | CelebBattleVoteBot | 2021-01-01 00:23:29 | Neutral | Positive | 1.0 | NaN | submission | 25 | marvel_vs_dc_team_marvel_evangeline_lilly_brie | 8 | [] | 0 |
| 1 | t1_ghnp3w4 | /r/CelebbattlePolls/comments/ko1ew0/marvel_vs_... | Poll for [Marvel vs DC : Team Marvel (Evangeli... | t3_ko1ew0 | r/CelebbattlePolls | CelebBattleVoteBot | 2021-01-01 00:23:30 | Neutral | Positive | 1.0 | submission | comment | 29 | marvel_vs_dc_team_marvel_evangeline_lilly_brie | 8 | ['https://reddit.com'] | 1 |
| 2 | t1_ghnp3wq | /r/CelebBattles/comments/ko1duk/marvel_vs_dc_t... | Vote here: https://www.reddit.com/poll/ko1ew0\... | t3_ko1duk | r/CelebBattles | CelebBattleVoteBot | 2021-01-01 00:23:30 | Positive | Neutral | 1.0 | submission | comment | 12 | marvel_vs_dc_team_marvel_evangeline_lilly_brie | 8 | ['https://www.reddit.com'] | 1 |
| 3 | t3_kp2soe | /r/CelebbattlePolls/comments/kp2soe/rachel_mca... | Rachel McAdams vs Amber Heard | NaN | r/CelebbattlePolls | CelebBattleVoteBot | 2021-01-02 18:35:07 | Neutral | Neutral | 2.0 | NaN | submission | 5 | rachel_mcadams_vs_amber_heard | 5 | [] | 0 |
| 4 | t1_ghul208 | /r/CelebbattlePolls/comments/kp2soe/rachel_mca... | Poll for [Rachel McAdams vs Amber Heard](https... | t3_kp2soe | r/CelebbattlePolls | CelebBattleVoteBot | 2021-01-02 18:35:07 | Neutral | Neutral | 1.0 | submission | comment | 9 | rachel_mcadams_vs_amber_heard | 5 | ['https://reddit.com'] | 1 |
df_vote_bot['permalink'].iloc[0]
'/r/CelebbattlePolls/comments/ko1ew0/marvel_vs_dc_team_marvel_evangeline_lilly_brie/'
df_vote_bot.subreddit.value_counts().head(10)
r/CelebbattlePolls    83
r/CelebBattles        79
r/JerkOffToCelebs      1
Name: subreddit, dtype: int64
df_vote_bot.text.value_counts().head()
Amanda Seyfried VS. Amber Heard                                                                                                      2
Anne Hathaway vs Amber Heard                                                                                                         2
Hottest League : Kristen Bell VS Amber Heard                                                                                         1
Poll for [Amber Heard vs Torrie Wilson](https://reddit.com/lp7af8) on CelebBattles                                                   1
Vote here: https://new.reddit.com/r/CelebbattlePolls/comments/mm3k4p\n\n---\n\n^^I'm ^^a ^^bot. ^^This ^^action ^^was ^^performed ^^automatically.    1
Name: text, dtype: int64
Positive Submissions
df_charliedba = df.query(" author == 'charliedba' ").reset_index(drop=True)
print(df_charliedba.shape)
df_charliedba.head()
(99, 17)
| child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | t3_ms8xkq | /r/Amber_Heard/comments/ms8xkq/long_legged_bea... | Long legged beauty [Amber Heard] | NaN | r/Amber_Heard | charliedba | 2021-04-16 18:07:26 | Negative | Positive | 1.0 | NaN | submission | 5 | Amber_Heard | 2 | [] | 0 |
| 1 | t3_msbo0x | /r/BeautifulFemales/comments/msbo0x/amber_hear... | Amber Heard [irtr] | NaN | r/BeautifulFemales | charliedba | 2021-04-16 20:23:46 | Neutral | Neutral | 29.0 | NaN | submission | 3 | amber_heard_irtr | 3 | [] | 0 |
| 2 | t3_mtkdvb | /r/Amber_Heard/comments/mtkdvb/angelic_smile_a... | Angelic smile [Amber Heard] | NaN | r/Amber_Heard | charliedba | 2021-04-18 20:08:57 | Positive | Positive | 1.0 | NaN | submission | 4 | Amber_Heard | 2 | [] | 0 |
| 3 | t3_mtkfg1 | /r/UHQcelebs/comments/mtkfg1/amber_heard_2100_... | Amber Heard [2100 x 3150] | NaN | r/UHQcelebs | charliedba | 2021-04-18 20:11:13 | Neutral | Neutral | 55.0 | NaN | submission | 5 | amber_heard_2100_x_3150 | 5 | [] | 0 |
| 4 | t3_mtkgku | /r/FamousFaces/comments/mtkgku/amber_heard_210... | Amber Heard [2100 x 3150] | NaN | r/FamousFaces | charliedba | 2021-04-18 20:12:54 | Neutral | Neutral | 5.0 | NaN | submission | 5 | amber_heard_2100_x_3150 | 5 | [] | 0 |
df_charliedba['permalink'].iloc[0]
'/r/Amber_Heard/comments/ms8xkq/long_legged_beauty_amber_heard/'
df_charliedba.subreddit.value_counts().head(10)
r/Amber_Heard_2       80
r/Amber_Heard          7
r/HighResCelebs        3
r/UHQcelebs            3
r/BeautifulFemales     2
r/PrettyWomen          1
r/FamousFaces          1
r/Celebs               1
r/Celebhub             1
Name: subreddit, dtype: int64
df_charliedba.text.value_counts().head(10)
Amber Heard                      31
Stunning Amber Heard              4
Gorgeous Amber Heard              3
Angelic Amber Heard               3
Cute Amber Heard                  3
Amber Heard [2100 x 3150]         3
Young Amber Heard                 2
Amber Heard [High Resolution]     2
Amber Heard [2550 x 3645]         2
Amber Heard [2218 x 3000]         2
Name: text, dtype: int64
Posting negative, sexually explicit comments
# check the date this account was created
# user_name --> index
# at[index_value , 'column']
# df_users.set_index('user_name').at['Stanley_Elkind', 'user_created_at']
df_users[df_users.user_name == 'Stanley_Elkind']
| user_name | has_verified_email | is_mod | is_gold | is_banned | comment_karma | link_karma | user_created_at | banned_unverified | creation_year | |
|---|---|---|---|---|---|---|---|---|---|---|
| 65169 | Stanley_Elkind | True | False | False | False | 22104.0 | 30001.0 | 2021-01-25 02:04:24 | others | 2021 |
df_stanley = df.query(" author == 'Stanley_Elkind' ").reset_index(drop=True)
print(df_stanley.shape)
df_stanley.head()
(44, 17)
| child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | t1_gmzc8ly | /r/JerkOffToCelebs/comments/lhu3nv/amber_heard... | I would put it in every one of her holes | t3_lhu3nv | r/JerkOffToCelebs | Stanley_Elkind | 2021-02-11 20:42:48 | Neutral | Neutral | 3.0 | submission | comment | 10 | amber_heard | 2 | [] | 0 |
| 1 | t1_gnf1pv3 | /r/DC_Cinematic/comments/ljqbut/discussion_hon... | Everyone is cool burgeoning cult leader Jared ... | t3_ljqbut | r/DC_Cinematic | Stanley_Elkind | 2021-02-14 15:28:05 | Positive | Positive | 1.0 | submission | comment | 9 | DC_Cinematic | 2 | [] | 0 |
| 2 | t1_gni2nrr | /r/JerkOffToCelebs/comments/lk5bc8/cant_help_b... | Black widow | t3_lk5bc8 | r/JerkOffToCelebs | Stanley_Elkind | 2021-02-15 04:21:15 | Negative | Neutral | 1.0 | submission | comment | 2 | cant_help_but_notice_the_way_she_looks_at_you_now | 11 | [] | 0 |
| 3 | t1_gnuualx | /r/JerkOffToCelebs/comments/lmc28b/amber_heard... | The more people hate her the more I want to fu... | t3_lmc28b | r/JerkOffToCelebs | Stanley_Elkind | 2021-02-18 05:36:20 | Negative | Neutral | 3.0 | submission | comment | 12 | amber_heard_could_do_with_a_skull_fucking | 8 | [] | 0 |
| 4 | t1_go1jx5r | /r/JerkOffToCelebs/comments/lnodlf/amber_heard... | I can’t be quit her. I don’t want to. | t3_lnodlf | r/JerkOffToCelebs | Stanley_Elkind | 2021-02-19 20:01:40 | Neutral | Positive | 1.0 | submission | comment | 9 | amber_heard_is_the_sexiest_psycho_i_ever_saw | 9 | [] | 0 |
df_stanley.subreddit.value_counts().head(10)
r/JerkOffToCelebs        36
r/pickoneceleb            3
r/CelebAssPussyMouth      2
r/CelebWouldYouRather     2
r/DC_Cinematic            1
Name: subreddit, dtype: int64
df_stanley.text.value_counts().head(10)
Breed repeatedly on my own: Gal\n\nBreed once: Melissa\n\nGang bang Margot\n\nAnd this was obviously just an excuse to hear other people say theyd watch Amber Heard get fucked by a horse    1
She’s going to need it. We won’t be gentle.    1
Oral slaves for a month: Margot and Elizabeth\n\nOne time BDSM session: Scarlett (dom) and Amber (sub)\n\nLesbian sex while I watch and jerk off: Gal and Brie    1
When it comes to Amber Heard I let my penis make the decisions.    1
Rough? DC\n\nPassionate, deep penetrating sex is for Marvel    1
Her ass is so nice 🤤    1
She knows the way to a man’s soul    1
Everyone is cool burgeoning cult leader Jared Leto though.    1
u r goin to jail 4 lyfe    1
Pick one to wake up next to, fool around with a bit and make love to well into the afternoon and another to show up to your house at night to be your utterly compliant sex toy until dawn - Elizabeth Olsen, Emilia Clarke, Ana de Armas, Jennifer Lawrence, Amber Heard, Margot Robbie    1
Name: text, dtype: int64
df_stanley_contributions = df_stanley.groupby(df_stanley.created_at.dt.date).size().reset_index(name='n_contributions')
fig = px.bar(df_stanley_contributions,
             x='created_at',
             y='n_contributions', title='The number of "Stanley_Elkind" contributions in 2021')
fig.update_traces(marker_color='red', marker_line_width=2, opacity=1, textposition='auto')
fig.show()
Negative Submissions
# check the date this account was created
df_users[df_users.user_name == 'Truthbetheprejudice']
| user_name | has_verified_email | is_mod | is_gold | is_banned | comment_karma | link_karma | user_created_at | banned_unverified | creation_year | |
|---|---|---|---|---|---|---|---|---|---|---|
| 70571 | Truthbetheprejudice | True | True | True | True | NaN | NaN | NaT | banned | banned |
df_truth = df.query(" author == 'Truthbetheprejudice' ").reset_index(drop=True)
print(df_truth.shape)
df_truth.head()
(43, 17)
| child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | t3_mtioam | /r/MensRights/comments/mtioam/johnny_depp_seen... | Johnny Depp Seen In Rare Photos In Spain, As H... | NaN | r/MensRights | Truthbetheprejudice | 2021-04-18 18:41:42 | Positive | Neutral | 1.0 | NaN | submission | 17 | johnny_depp_seen_in_rare_photos_in_spain_as_his | 10 | [] | 0 |
| 1 | t3_mtjtsr | /r/movies/comments/mtjtsr/johnny_depp_makes_an... | Johnny Depp Makes an Appearance in Spain as La... | NaN | r/movies | Truthbetheprejudice | 2021-04-18 19:40:25 | Positive | Negative | 1.0 | NaN | submission | 16 | johnny_depp_makes_an_appearance_in_spain_as | 8 | [] | 0 |
| 2 | t3_mtkogv | /r/entertainment/comments/mtkogv/johnny_depp_m... | Johnny Depp Makes an Appearance in Spain as La... | NaN | r/entertainment | Truthbetheprejudice | 2021-04-18 20:24:14 | Positive | Negative | 1.0 | NaN | submission | 16 | johnny_depp_makes_an_appearance_in_spain_as | 8 | [] | 0 |
| 3 | t3_muyp3k | /r/MensRights/comments/muyp3k/justiceforjohnny... | #JusticeForJohnnyDepp Johnny Depp fans donate ... | NaN | r/MensRights | Truthbetheprejudice | 2021-04-20 19:47:45 | Neutral | Negative | 75.0 | NaN | submission | 12 | justiceforjohnnydepp_johnny_depp_fans_donate_40k | 6 | [] | 0 |
| 4 | t3_muypet | /r/entertainment/comments/muypet/justiceforjoh... | #JusticeForJohnnyDepp Johnny Depp fans donate ... | NaN | r/entertainment | Truthbetheprejudice | 2021-04-20 19:48:06 | Neutral | Negative | 1.0 | NaN | submission | 12 | justiceforjohnnydepp_johnny_depp_fans_donate_40k | 6 | [] | 0 |
df_truth.subreddit.value_counts().head(10)
r/MensRights              10
r/movies                  10
r/entertainment           10
r/pussypassdenied          8
r/JusticeForJohnnyDepp     5
Name: subreddit, dtype: int64
df_truth.text.value_counts().head(10)
Amber Heard Is Allegedly Being Investigated By LAPD For Perjury, Could Face Jail Time Over Johnny Depp Domestic Violence Accusations    5
Amber Heard Under Investigation for Perjury in Johnny Depp Domestic Violence Case                                                      5
Amber Heard probed for perjury and staging 2016 Johnny Depp domestic violence case: 'Lock her up'                                      5
Amber Heard to struggle in new LAPD domestic violence probe                                                                            5
Johnny Depp sues ACLU on to see if ex-wife Amber Heard gave $7m divorce settlement                                                     5
#JusticeForJohnnyDepp Johnny Depp fans donate $40k to children that Amber Heard neglected                                              4
Remove Amber Heard from Aquaman 2                                                                                                      4
Confirmed: Amber Heard Under Criminal Investigation in Australia                                                                       4
Big tech silence Johnny Depp and censor Amber Heard evidence                                                                           3
Johnny Depp Makes an Appearance in Spain as Lawyers Drop New Evidence in Amber Heard Case                                              2
Name: text, dtype: int64
df_truth_contributions = df_truth.groupby(df_truth.created_at.dt.date).size().reset_index(name='n_contributions')
df_truth_contributions
| created_at | n_contributions | |
|---|---|---|
| 0 | 2021-04-18 | 3 |
| 1 | 2021-04-20 | 4 |
| 2 | 2021-04-29 | 4 |
| 3 | 2021-04-30 | 4 |
| 4 | 2021-05-07 | 8 |
| 5 | 2021-05-08 | 5 |
| 6 | 2021-05-09 | 5 |
| 7 | 2021-05-10 | 5 |
| 8 | 2021-05-19 | 5 |
fig = px.bar(df_truth_contributions,
x='created_at',
y='n_contributions', title='The number of "Truthbetheprejudice" contributions in 2021')
fig.update_traces(marker_color='red',
marker_line_width=2, opacity=1, textposition='auto')
fig.show()
Voting in celebrity-battle threads
df_gaul = df.query(" author == 'gaul66' ").reset_index(drop=True)
print(df_gaul.shape)
df_gaul.head()
(37, 17)
| child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | t1_ghobzuy | /r/CelebBattles/comments/ko1duk/marvel_vs_dc_t... | Marvel | t3_ko1duk | r/CelebBattles | gaul66 | 2021-01-01 04:29:09 | Neutral | Positive | 8.0 | submission | comment | 1 | marvel_vs_dc_team_marvel_evangeline_lilly_brie | 8 | [] | 0 |
| 1 | t1_ghuwup4 | /r/CelebBattles/comments/kp2qmx/rachel_mcadams... | Rachel | t3_kp2qmx | r/CelebBattles | gaul66 | 2021-01-02 20:16:31 | Neutral | Neutral | 2.0 | submission | comment | 1 | rachel_mcadams_vs_amber_heard | 5 | [] | 0 |
| 2 | t1_giqmms5 | /r/CelebBattles/comments/ku6peu/hotter_one_gal... | **Gal** | t3_ku6peu | r/CelebBattles | gaul66 | 2021-01-10 06:10:19 | Neutral | Neutral | 5.0 | submission | comment | 1 | hotter_one_gal_gadot_vs_amber_heard_vs_alison_... | 10 | [] | 0 |
| 3 | t1_gk1f28y | /r/CelebBattles/comments/l1qwsh/my_top_6random... | Natalie | t3_l1qwsh | r/CelebBattles | gaul66 | 2021-01-21 07:10:16 | Neutral | Neutral | 1.0 | submission | comment | 1 | my_top_6random_order_natalie_portman_vs_emily | 8 | [] | 0 |
| 4 | t1_gkf1t2n | /r/CelebBattles/comments/l3eo1l/katy_perry_vs_... | **Evangeline** | t3_l3eo1l | r/CelebBattles | gaul66 | 2021-01-23 18:03:01 | Neutral | Neutral | 2.0 | submission | comment | 1 | katy_perry_vs_evangeline_lilly_vs_amber_heard | 8 | [] | 0 |
df_gaul.subreddit.value_counts().head(10)
r/CelebBattles    37
Name: subreddit, dtype: int64
df_gaul['permalink'].iloc[5]
'/r/CelebBattles/comments/l4h1wp/amber_heard_vs_brie_larson/gkpl5eg/'
Negative Comments
# check the date this account was created
df_users[df_users.user_name == 'sadwook']
| user_name | has_verified_email | is_mod | is_gold | is_banned | comment_karma | link_karma | user_created_at | banned_unverified | creation_year | |
|---|---|---|---|---|---|---|---|---|---|---|
| 43292 | sadwook | True | False | False | False | 131.0 | 21.0 | 2018-12-29 07:46:33 | others | 2018 |
df_sadwook = df.query(" author == 'sadwook' ").reset_index(drop=True)
print(df_sadwook.shape)
df_sadwook.head()
(32, 17)
| child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | t1_gxpnlsu | /r/pussypassdenied/comments/n95rfx/amber_heard... | Good... \n\n"" He said: “The severity of these... | t3_n95rfx | r/pussypassdenied | sadwook | 2021-05-11 10:09:06 | Positive | Negative | 1.0 | submission | comment | 101 | amber_heard_to_struggle_in_new_lapd_domestic | 8 | [] | 0 |
| 1 | t1_gxpnp4c | /r/pussypassdenied/comments/n95rfx/amber_heard... | i will not hold back. JUSTICE FOR JOHNNY | t3_n95rfx | r/pussypassdenied | sadwook | 2021-05-11 10:10:29 | Neutral | Negative | 2.0 | submission | comment | 8 | amber_heard_to_struggle_in_new_lapd_domestic | 8 | [] | 0 |
| 2 | t1_gxpntzg | /r/pussypassdenied/comments/n95rfx/amber_heard... | we wont stay silent. this mans reputation is g... | t1_gxnoecm | r/pussypassdenied | sadwook | 2021-05-11 10:12:32 | Positive | Neutral | 1.0 | comment | comment | 34 | amber_heard_to_struggle_in_new_lapd_domestic | 8 | [] | 0 |
| 3 | t1_gxpnxf1 | /r/pussypassdenied/comments/n95rfx/amber_heard... | pretty much the summary of her career. im not ... | t1_gxoh4o0 | r/pussypassdenied | sadwook | 2021-05-11 10:13:55 | Negative | Neutral | 3.0 | comment | comment | 24 | amber_heard_to_struggle_in_new_lapd_domestic | 8 | [] | 0 |
| 4 | t1_gxpnzbf | /r/pussypassdenied/comments/n95rfx/amber_heard... | bruh | t1_gxn7p1k | r/pussypassdenied | sadwook | 2021-05-11 10:14:43 | Neutral | Neutral | 0.0 | comment | comment | 1 | amber_heard_to_struggle_in_new_lapd_domestic | 8 | [] | 0 |
df_sadwook.subreddit.value_counts().head(10)
r/pussypassdenied    32
Name: subreddit, dtype: int64
df_sadwook_contributions = df_sadwook.groupby(df_sadwook.created_at.dt.date).size().reset_index(name='n_contributions')
df_sadwook_contributions
| created_at | n_contributions | |
|---|---|---|
| 0 | 2021-05-11 | 32 |
fig = px.bar(df_sadwook_contributions,
x='created_at',
y='n_contributions', title='The number of "sadwook" contributions in 2021')
fig.update_traces(marker_color='red',
marker_line_width=2, opacity=1, textposition='auto')
fig.show()
This is suspicious!
All 32 of this user's 2021 contributions were posted on a single day, 2021-05-11.
The account was created on 2018-12-29, the same date as the 2018 account-creation peak!
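This single-day burst can be turned into a simple score: the share of an author's contributions that fall on their single busiest day, where a share of 1.0 over many posts is a red flag. A sketch on toy timestamps loosely mimicking the two accounts (the dates below are illustrative, not taken from the dataset):

```python
import pandas as pd

# Toy contributions: a "sadwook"-style one-day burst vs. a spread-out poster.
toy = pd.DataFrame({
    "author": ["sadwook"] * 3 + ["Beatplayer"] * 3,
    "created_at": pd.to_datetime([
        "2021-05-11 10:09", "2021-05-11 10:10", "2021-05-11 11:05",
        "2021-03-03 22:00", "2021-03-04 09:00", "2021-03-05 12:00",
    ]),
})

# Contributions per author per calendar day ...
daily = toy.groupby(["author", toy.created_at.dt.date]).size()
# ... and the share posted on each author's single busiest day.
burst = daily.groupby("author").max() / daily.groupby("author").sum()
print(burst.round(2).to_dict())  # {'Beatplayer': 0.33, 'sadwook': 1.0}
```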
df_sadwook_hrs = df_sadwook.groupby(df_sadwook.created_at.dt.hour).size().reset_index(name='n_contributions')
df_sadwook_hrs
| created_at | n_contributions | |
|---|---|---|
| 0 | 10 | 19 |
| 1 | 11 | 13 |
fig = px.bar(df_sadwook_hrs,
             x='created_at',
             y='n_contributions', title='The number of "sadwook" contributions per hour on 2021-05-11')
fig.update_traces(marker_color='red',
                  marker_line_width=2, opacity=1, textposition='auto')
fig.show()
Positive Comments
df_beatplayer = df.query(" author == 'Beatplayer' ").reset_index(drop=True)
print(df_beatplayer.shape)
df_beatplayer.head()
(32, 17)
| child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | t1_gpl1ocz | /r/TrueOffMyChest/comments/lx2s7w/its_disgusti... | Wait wait. Has something changed here. Because... | t3_lx2s7w | r/TrueOffMyChest | Beatplayer | 2021-03-03 22:00:23 | Negative | Neutral | 36.0 | submission | comment | 36 | its_disgusting_that_people_are_less_angry_about | 8 | [] | 0 |
| 1 | t1_gpl5mw9 | /r/TrueOffMyChest/comments/lx2s7w/its_disgusti... | But then again, that was just a statement made... | t1_gpl5997 | r/TrueOffMyChest | Beatplayer | 2021-03-03 22:30:38 | Positive | Neutral | -1.0 | comment | comment | 86 | its_disgusting_that_people_are_less_angry_about | 8 | [] | 0 |
| 2 | t1_gpl71q8 | /r/TrueOffMyChest/comments/lx2s7w/its_disgusti... | Like most abusive relationships tbh, from the ... | t1_gpl6uww | r/TrueOffMyChest | Beatplayer | 2021-03-03 22:41:44 | Positive | Neutral | -2.0 | comment | comment | 26 | its_disgusting_that_people_are_less_angry_about | 8 | [] | 0 |
| 3 | t1_gpla7k5 | /r/TrueOffMyChest/comments/lx2s7w/its_disgusti... | K. So I’m 12.5 mins in, and absolutely shatter... | t1_gpl8d57 | r/TrueOffMyChest | Beatplayer | 2021-03-03 23:07:01 | Negative | Negative | 20.0 | comment | comment | 109 | its_disgusting_that_people_are_less_angry_about | 8 | [] | 0 |
| 4 | t1_gplbi77 | /r/TrueOffMyChest/comments/lx2s7w/its_disgusti... | I’m not sure that the case to be honest. \n\nI... | t1_gpl9u5x | r/TrueOffMyChest | Beatplayer | 2021-03-03 23:17:45 | Negative | Neutral | 15.0 | comment | comment | 149 | its_disgusting_that_people_are_less_angry_about | 8 | [] | 0 |
df_beatplayer['permalink'].iloc[0]
'/r/TrueOffMyChest/comments/lx2s7w/its_disgusting_that_people_are_less_angry_about/gpl1ocz/'
df_beatplayer.subreddit.value_counts().head(10)
r/TrueOffMyChest    32
Name: subreddit, dtype: int64
df_beatplayer.text.value_counts().head(3)
I’m sure that I’d rake your subjective interpretation of the body language of a DV violence victim in court of video evidence, and I think that’s the crux of the issue with women being beloved tbh.    1
Wait wait. Has something changed here. Because she won two cases in jurisdictions that are notoriously difficult to have female evidence believed. \n\nShe had videos.\n\nGenuine ask - am I missing this evidence of AH lying?    1
I mean - neither of those situations are rape. They’re absolutely gross, and criminal, but they are absolutely not rape. \n\nCheck it out and get back to me with a retraction.    1
Name: text, dtype: int64
9- Investigating authors with the most negative comments / submissions
df_negative.author.value_counts()
-banned- 130
AutoModerator 9
LoveAmberHeard42286 9
Stanley_Elkind 9
KeepingDankMemesDank 8
...
officialnast 1
I_bless_you 1
AnnaBortion269 1
TacticalSnacks 1
IamHere019 1
Name: author, Length: 1620, dtype: int64
fig = px.bar(df_negative.author.value_counts().to_frame().head(10).reset_index(),
             x="author", y="index", height=500,
             title='Authors with most Negative Comments / Submissions').update_traces(
    marker_color='#5296dd').update_layout(
    xaxis_title='number of comments / submissions',
    yaxis_title='user name')
fig.update_yaxes(autorange="reversed")
Many of the most negative authors have already been banned from Reddit, which suggests moderation caught them.
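That observation can be quantified by joining the negative authors against the users table and measuring the banned share. A sketch on made-up user names (in our data, already-deleted authors appear as '-banned-' and have no profile row to look up):

```python
import pandas as pd

# Hypothetical negative authors and a toy slice of the users table.
neg_authors = pd.Series(["-banned-", "user_a", "user_b", "user_c"], name="author")
users = pd.DataFrame({
    "user_name": ["user_a", "user_b", "user_c"],
    "is_banned": [True, False, True],
})

merged = neg_authors.to_frame().merge(
    users, how="left", left_on="author", right_on="user_name")
# Authors with no profile left to look up are counted as banned too.
banned_share = merged.is_banned.fillna(True).mean()
print(banned_share)  # 0.75
```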
# suspected3 = df_negative.author.value_counts().head(10).index.to_list()
# set(suspected3) & set(suspected_list)
# Check the dates these accounts were created
# suspected_list = suspected_list + suspected3
# # Further investigate the text of the most negative commented users
# df_text5 = df[df.author.isin(suspected3)]
# df_text5 = df_text5.sort_values('created_at')
# print(df_text5.shape)
# df_text5.head(60)
# df_text5.text.value_counts().head(10)
10- Investigating authors with the most negative submissions
df_negative_submission = df_negative[df_negative.submission_comment == 'submission']
print(df_negative_submission.shape)
df_negative_submission.head(2)
(167, 17)
| child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 183 | t3_ko9ymn | /r/memes/comments/ko9ymn/fuck_amber_heard_cred... | Fuck Amber Heard (credit to u/Whitewolf699420 ... | NaN | r/memes | -banned- | 2021-01-01 11:39:40 | Negative | Negative | 1.0 | NaN | submission | 9 | fuck_amber_heard_credit_to_uwhitewolf699420_for | 7 | [] | 0 |
| 186 | t3_ko9zmj | /r/dankmemes/comments/ko9zmj/fuck_amber_heard_... | Fuck Amber Heard (credit to u/Whitewolf699420 ... | NaN | r/dankmemes | -banned- | 2021-01-01 11:41:46 | Negative | Negative | 1.0 | NaN | submission | 8 | fuck_amber_heard_credit_to_uwhitewolf699420_for | 7 | [] | 0 |
df_negative_submission.author.value_counts().nlargest(n=10)
-banned-               60
Truthbetheprejudice     5
bradje61                3
Sleeper1034             3
krttz35                 3
DerpimusNoobimus        2
Memestrats4life         2
shalanarose             2
MikiSayaka33            2
spiritual_one1          2
Name: author, dtype: int64
df_negative_submission.author.value_counts().to_frame().head(10)
| author | |
|---|---|
| -banned- | 60 |
| Truthbetheprejudice | 5 |
| bradje61 | 3 |
| Sleeper1034 | 3 |
| krttz35 | 3 |
| DerpimusNoobimus | 2 |
| Memestrats4life | 2 |
| shalanarose | 2 |
| MikiSayaka33 | 2 |
| spiritual_one1 | 2 |
fig = px.bar(df_negative_submission.author.value_counts().to_frame().head(10).reset_index(),
             x="author", y="index", height=500,
             title='Authors with most Negative Submissions').update_traces(
    marker_color='#5296dd').update_layout(
    xaxis_title='Number of Negative Submissions',
    yaxis_title='Author_Name')
fig.update_yaxes(autorange="reversed")
# suspected4 = df_negative_submission.author.value_counts().nlargest(n=10).index.to_list()
# set(suspected4) & set(suspected_list)
# suspected_list = suspected_list + suspected4
# # Further investigate the text of the most negative submitted users
# df_text6 = df[df.author.isin(suspected4)]
# df_text6 = df_text6.sort_values('created_at')
# print(df_text6.shape)
# df_text6.head()
# df_text6.text.value_counts().head(10)
11- Check whether the users contributing the most to negative comments/submissions are mod, gold, or have a verified email
df_negative.author.value_counts().nlargest(n=25)
-banned-                130
AutoModerator             9
LoveAmberHeard42286       9
Stanley_Elkind            9
KeepingDankMemesDank      8
my_alt_account1312        6
DrewFlan                  6
Beatplayer                5
Truthbetheprejudice       5
AltruisticVariation4      5
The_Scamp                 5
durant92bhd               4
modsRwads                 4
Juju_mila                 4
Zom-bom                   4
amphibiousParakeet        4
flat_earth_pancakes       4
Anonymous2401             4
nojodricri                4
Furkan38000               4
WayneTedrowJunior         4
WhosDadIsThat             3
reptilicious1             3
Reddit_Shadow_            3
DerpimusNoobimus          3
Name: author, dtype: int64
check_list = df_negative.author.value_counts().nlargest(n=25).index.tolist()[1:]
check_list
['AutoModerator', 'LoveAmberHeard42286', 'Stanley_Elkind', 'KeepingDankMemesDank', 'my_alt_account1312', 'DrewFlan', 'Beatplayer', 'Truthbetheprejudice', 'AltruisticVariation4', 'The_Scamp', 'durant92bhd', 'modsRwads', 'Juju_mila', 'Zom-bom', 'amphibiousParakeet', 'flat_earth_pancakes', 'Anonymous2401', 'nojodricri', 'Furkan38000', 'WayneTedrowJunior', 'WhosDadIsThat', 'reptilicious1', 'Reddit_Shadow_', 'DerpimusNoobimus']
# get a data frame with the most negative-comments users
df_check = df_users[df_users['user_name'].isin(check_list)]
print(df_check.shape)
df_check.head(2)
(24, 10)
| user_name | has_verified_email | is_mod | is_gold | is_banned | comment_karma | link_karma | user_created_at | banned_unverified | creation_year | |
|---|---|---|---|---|---|---|---|---|---|---|
| 4432 | AutoModerator | True | True | True | False | 1000.0 | 1000.0 | 2012-01-05 05:24:28 | others | others |
| 6953 | DrewFlan | True | False | False | False | 325044.0 | 5513.0 | 2012-08-27 19:39:06 | others | others |
df_check['user_name'].nunique()
24
for col in df_check.columns:
if col not in ['user_name', 'user_created_at']:
print('The value counts of the users contributing the most to negative comments/submissions: ' + col)
print(df_check[col].value_counts())
print('\n')
The value counts of the users contributing the most to negative comments/submissions: has_verified_email
True     21
False     3
Name: has_verified_email, dtype: int64

The value counts of the users contributing the most to negative comments/submissions: is_mod
False    16
True      8
Name: is_mod, dtype: int64

The value counts of the users contributing the most to negative comments/submissions: is_gold
False    18
True      6
Name: is_gold, dtype: int64

The value counts of the users contributing the most to negative comments/submissions: is_banned
False    20
True      4
Name: is_banned, dtype: int64

The value counts of the users contributing the most to negative comments/submissions: comment_karma
14105.0     1
146.0       1
1000.0      1
12123.0     1
90873.0     1
49687.0     1
12964.0     1
13977.0     1
188.0       1
4174.0      1
15574.0     1
1916.0      1
110672.0    1
18790.0     1
325044.0    1
22104.0     1
5880.0      1
1206.0      1
30348.0     1
402308.0    1
Name: comment_karma, dtype: int64

The value counts of the users contributing the most to negative comments/submissions: link_karma
3388.0     1
1372.0     1
1000.0     1
15274.0    1
1127.0     1
5513.0     1
14009.0    1
10143.0    1
2680.0     1
9269.0     1
345.0      1
1185.0     1
3104.0     1
2985.0     1
51.0       1
10931.0    1
30001.0    1
20.0       1
1788.0     1
1.0        1
Name: link_karma, dtype: int64

The value counts of the users contributing the most to negative comments/submissions: banned_unverified
others        17
banned         4
unverified     3
Name: banned_unverified, dtype: int64

The value counts of the users contributing the most to negative comments/submissions: creation_year
2020      7
others    6
banned    4
2021      3
2019      2
2018      2
Name: creation_year, dtype: int64
13- Investigating subreddits with the most negative comments¶
df_negative.subreddit.value_counts()
r/entertainment 287
r/pussypassdenied 204
r/JerkOffToCelebs 173
r/MensRights 134
r/iamatotalpieceofshit 114
...
r/HolUp 1
r/Celebswithbigtits 1
r/agedlikewine 1
r/TIHI 1
r/celebheels 1
Name: subreddit, Length: 176, dtype: int64
df_negative.subreddit.value_counts().nlargest(n=25)
r/entertainment            287
r/pussypassdenied          204
r/JerkOffToCelebs          173
r/MensRights               134
r/iamatotalpieceofshit     114
r/AskReddit                 87
r/TrueOffMyChest            86
r/DC_Cinematic              74
r/EntitledBitch             45
r/movies                    40
r/redditmoment              38
r/awfuleverything           33
r/PurplePillDebate          32
r/PrequelMemes              30
r/celebnsfw                 29
r/CelebBattles              29
r/dankmemes                 20
r/JusticeForJohnnyDepp      20
r/memes                     18
r/TheStand                  17
r/SubredditDrama            17
r/Makeup                    15
r/funny                     14
r/dontputyourdickinthat     14
r/gameofthrones             13
Name: subreddit, dtype: int64
fig = px.bar(df_negative.subreddit.value_counts().to_frame().head(25).reset_index(),
             x="subreddit", y="index", height=500,
             title='Subreddits with the most Negative Comments / Submissions')
fig.update_traces(marker_color='#5296dd')
fig.update_layout(xaxis_title='Number of Negative Comments', yaxis_title='Subreddit')
fig.update_yaxes(autorange="reversed")
The subreddits with the most negative comments largely overlap with the most-used subreddits overall.
14- Submission URLS¶
df['urls'].nunique()
233
df[df.astype(str)['urls'] != '[]'].head(2)
| child_id | permalink | text | parent_id | subreddit | author | created_at | sentiment_blob | sentiment_nltk | score | top_level | submission_comment | text_words | submission_text | submission_words | urls | urls_count | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | t1_ghnp3w4 | /r/CelebbattlePolls/comments/ko1ew0/marvel_vs_... | Poll for [Marvel vs DC : Team Marvel (Evangeli... | t3_ko1ew0 | r/CelebbattlePolls | CelebBattleVoteBot | 2021-01-01 00:23:30 | Neutral | Positive | 1.0 | submission | comment | 29 | marvel_vs_dc_team_marvel_evangeline_lilly_brie | 8 | ['https://reddit.com'] | 1 |
| 3 | t1_ghnp3wq | /r/CelebBattles/comments/ko1duk/marvel_vs_dc_t... | Vote here: https://www.reddit.com/poll/ko1ew0\... | t3_ko1duk | r/CelebBattles | CelebBattleVoteBot | 2021-01-01 00:23:30 | Positive | Neutral | 1.0 | submission | comment | 12 | marvel_vs_dc_team_marvel_evangeline_lilly_brie | 8 | ['https://www.reddit.com'] | 1 |
df['urls'].astype('str').value_counts().head()
[]                                                                                                                                                                         17522
['https://jerkofftocelebs.com', 'https://jerkofftocelebs.com', 'https://reddit.com', 'https://jerkofftocelebs.com', 'https://discord.gg', 'https://jerkofftocelebs.com']      161
['https://www.reddit.com']                                                                                                                                                    49
['https://reddit.com']                                                                                                                                                        43
['https://redd.it', 'https://www.instagram.com']                                                                                                                              42
Name: urls, dtype: int64
# the value counts of the # of urls
df['urls_count'].value_counts()
0     17522
1       394
6       170
2       130
3        41
5        19
4        14
8         5
9         3
10        3
16        2
20        1
60        1
Name: urls_count, dtype: int64
df['urls_count'].value_counts()[1:].plot(kind='bar', figsize=(8,8));
plt.title('Count of the number of URLS in each submission/comment');
16- Investigating submissions with the most negative comments¶
Note: counting by parent_id only captures top-level comments, whose parent_id points directly at the submission (nested comments point at another comment instead).
df_negative.parent_id.value_counts().nlargest(n=25)
t3_msgoz8    29
t3_lu5055    26
t3_lx2s7w    26
t3_n9lwnf    25
t3_kyj460    18
t3_lgbi6k    15
t3_mgjuv3    15
t3_n77qe8    12
t3_lo748q    12
t3_mssbu1    10
t3_l6ler4    10
t3_n7ug34    10
t3_ko1xfy     9
t3_l8g47j     8
t3_ktkm8u     8
t3_n884fx     8
t3_kqbj53     8
t3_mygtjf     7
t3_kr5da2     7
t3_kxwb32     7
t3_msz0b7     6
t3_ncf0uq     6
t3_l898tv     6
t3_lo6f92     6
t3_n95rfx     6
Name: parent_id, dtype: int64
fig = px.bar(df_negative.parent_id.value_counts().to_frame().head(25).reset_index(),
             x="parent_id", y="index", height=500,
             title='Submissions with the most Negative Comments')
fig.update_traces(marker_color='#5296dd')
fig.update_layout(xaxis_title='Number of Negative Comments', yaxis_title='Submission ID')
fig.update_yaxes(autorange="reversed")
Difference in time between account creation and (negative) posting¶
# note: value_counts() excludes NaN values by default
df_merged_negative["days_after_creation"].value_counts()
632.0 8
358.0 7
67.0 7
150.0 6
3355.0 5
..
1355.0 1
2531.0 1
763.0 1
1096.0 1
1211.0 1
Name: days_after_creation, Length: 1225, dtype: int64
px.histogram(df_merged_negative, x="days_after_creation",title='days_after_creation',
nbins=250).update_traces(marker_color='#5296dd',).update_layout(
xaxis_title='number of days',)
Posting Duration After Account Creation¶
print('The number of negative contributions posted the same day the account was created:')
df_merged_negative[df_merged_negative['days_after_creation'] == 0].shape[0]
The number of negative contributions posted the same day the account was created:
3
print('The number of negative contributions posted within a week of account creation:')
df_merged_negative[df_merged_negative['days_after_creation'] <= 7].shape[0]
The number of negative contributions posted within a week of account creation:
21
print('The number of negative contributions posted within a month of account creation:')
df_merged_negative[df_merged_negative['days_after_creation'] <= 30].shape[0]
The number of negative contributions posted within a month of account creation:
50
df_merged_negative[df_merged_negative['days_after_creation'] <= 30]['user_created_at'].dt.year.value_counts()
2021    35
2020    15
Name: user_created_at, dtype: int64
We can see that 35 contributions came from accounts that were created in 2021 and posted within a month of creation.
mask = (df_merged_negative['days_after_creation'] <= 30) & (df_merged_negative['user_created_at'].dt.year == 2021)
df_merged_negative[mask]['user_created_at'].dt.strftime('%b').value_counts()
Jan    11
Apr     9
Feb     7
Mar     7
May     1
Name: user_created_at, dtype: int64
months = df_merged_negative[df_merged_negative['days_after_creation'] <= 30]['user_created_at'].dt.strftime('%b')
months_sorted = months.value_counts()[['Jan', 'Feb', 'Mar', 'Apr', 'May']]
months_sorted
Jan    11
Feb     7
Mar     7
Apr     9
May     1
Name: user_created_at, dtype: int64
months_sorted.plot(kind='bar', figsize=(8,8), title='Contributions of accounts that posted/commented negatively \n within a month of creation');
# THE SAME MONTH:
# check the dates on which these accounts posted/commented
reddit_30 = df_merged_negative[df_merged_negative['days_after_creation'] <= 30]
dates_count = reddit_30.groupby(reddit_30['created_at'].dt.date).size().reset_index(name='contributions')
dates_count.sort_values('contributions', ascending=False);
fig = px.bar(dates_count,
x='created_at',
y='contributions', title = 'contributions of the accounts posted/commented the same month they were created')
fig.update_traces(marker_color='#5296dd',
marker_line_width=1.5, opacity=1, textposition='auto').update_layout()
fig.show()
# THE SAME WEEK
# check the dates on which these accounts posted/commented
reddit_7 = df_merged_negative[df_merged_negative['days_after_creation'] <= 7]
dates_count_7 = reddit_7.groupby(reddit_7['created_at'].dt.date).size().reset_index(name='contributions')
dates_count_7.sort_values('contributions', ascending=False);
fig = px.bar(dates_count_7,
x='created_at',
y='contributions', title = 'contributions of the accounts posted/commented the same week they were created')
fig.update_traces(marker_color='#5296dd',
marker_line_width=1.5, opacity=1, textposition='auto').update_layout()
fig.show()
# THE SAME DAY
# check the dates on which these accounts posted/commented
reddit_1 = df_merged_negative[df_merged_negative['days_after_creation'] <= 0]
dates_count_1 = reddit_1.groupby(reddit_1['created_at'].dt.date).size().reset_index(name='contributions')
dates_count_1.sort_values('contributions', ascending=False);
fig = px.bar(dates_count_1,
x='created_at',
y='contributions', title = 'contributions of the accounts posted/commented the same day they were created')
fig.update_traces(marker_color='#5296dd',
marker_line_width=1.5, opacity=1, textposition='auto').update_layout()
fig.show()
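The three cells above (same day / same week / same month) repeat the same filter-group-plot pattern, so they could be collapsed into one helper. A sketch, assuming the `days_after_creation` and datetime `created_at` columns used above; the function name `daily_contributions` is hypothetical:

```python
import pandas as pd

def daily_contributions(df, max_days):
    """Daily counts of contributions from accounts that posted within
    `max_days` days of their creation date. Assumes `df` carries the
    `days_after_creation` and datetime `created_at` columns used above."""
    subset = df[df["days_after_creation"] <= max_days]
    return (subset.groupby(subset["created_at"].dt.date)
                  .size()
                  .reset_index(name="contributions"))

# One call per window instead of three copied cells, e.g.:
# for days, label in [(0, "day"), (7, "week"), (30, "month")]:
#     px.bar(daily_contributions(df_merged_negative, days),
#            x="created_at", y="contributions",
#            title=f"Contributions posted the same {label} the account was created").show()
```

This keeps the filter logic in one place, so a change to the window definition only needs to be made once.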
# get the author names that commented negatively within a month of account creation,
# to add to the suspected list (df_merged_negative is already filtered to negative rows)
df_merged_30 = df_merged_negative.query("days_after_creation <= 30")
df_merged_30.head()
| child_id | permalink | text | parent_id | subreddit | created_at | sentiment_blob | sentiment_nltk | score | top_level | ... | is_mod | is_gold | is_banned | comment_karma | link_karma | user_created_at | banned_unverified | creation_year | diff | days_after_creation | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2784 | t1_ghra7an | /r/Makeup/comments/kogfek/amber_heard_the_face... | Um... wtf. A UK court found that Johnny did ph... | t3_kogfek | r/Makeup | 2021-01-01 20:39:19 | Negative | Negative | 113.0 | submission | ... | False | False | False | 13294.0 | 3484.0 | 2020-12-25 03:08:50 | unverified | 2020 | 7 days 17:30:29 | 7.0 |
| 2785 | t1_ghrcksp | /r/Makeup/comments/kogfek/amber_heard_the_face... | I figure it’s just Pirates of the Caribbean st... | t1_ghrbajl | r/Makeup | 2021-01-01 21:00:15 | Negative | Negative | 48.0 | comment | ... | False | False | False | 13294.0 | 3484.0 | 2020-12-25 03:08:50 | unverified | 2020 | 7 days 17:51:25 | 7.0 |
| 2904 | t1_ghse9qw | /r/Vent/comments/jpgsyv/i_fucking_hate_amber_h... | JOHNNY DEPP IS A SCUMBAG DRUG ADDICT PIECE OF ... | t3_jpgsyv | r/Vent | 2021-01-02 02:43:56 | Negative | Negative | 1.0 | submission | ... | False | False | False | 188.0 | 20.0 | 2020-12-28 04:17:41 | others | 2020 | 4 days 22:26:15 | 4.0 |
| 2905 | t1_ghsebqn | /r/Vent/comments/jpgsyv/i_fucking_hate_amber_h... | GO FUCK YOURSELF | t1_gbf65ke | r/Vent | 2021-01-02 02:44:27 | Negative | Negative | 0.0 | comment | ... | False | False | False | 188.0 | 20.0 | 2020-12-28 04:17:41 | others | 2020 | 4 days 22:26:46 | 4.0 |
| 2906 | t1_ghsen7o | /r/Vent/comments/jpgsyv/i_fucking_hate_amber_h... | FUCK YOU AND FUCK JOHNNY DEPP | t1_gbhcttr | r/Vent | 2021-01-02 02:47:19 | Negative | Negative | -1.0 | comment | ... | False | False | False | 188.0 | 20.0 | 2020-12-28 04:17:41 | others | 2020 | 4 days 22:29:38 | 4.0 |
5 rows × 24 columns
# suspected5 = (df_merged_30.user_name).tolist()
# set(suspected5) & set(suspected_list)
# suspected_list = suspected_list + suspected5
# with pd.option_context('display.max_colwidth', None):
# display(df_merged.query("user_name == 'LoveAmberHeard42286' ").text.head())
This account does not look suspicious, so we remove it from suspected_list.
# suspected_list.remove('LoveAmberHeard42286')
# len(suspected_list)
Negative contributions in 2021 by account creation year¶
# negative contributions in 2021, grouped by the year the contributing account was created
# (note: this counts contributions, not unique accounts)
negative_contributions = df_merged_negative.groupby(df_merged_negative['user_created_at'].dt.year).size().reset_index(name='n_accounts')
negative_contributions
| user_created_at | n_accounts | |
|---|---|---|
| 0 | 2006.0 | 5 |
| 1 | 2007.0 | 3 |
| 2 | 2008.0 | 8 |
| 3 | 2009.0 | 7 |
| 4 | 2010.0 | 16 |
| 5 | 2011.0 | 164 |
| 6 | 2012.0 | 73 |
| 7 | 2013.0 | 65 |
| 8 | 2014.0 | 69 |
| 9 | 2015.0 | 82 |
| 10 | 2016.0 | 127 |
| 11 | 2017.0 | 133 |
| 12 | 2018.0 | 212 |
| 13 | 2019.0 | 317 |
| 14 | 2020.0 | 446 |
| 15 | 2021.0 | 129 |
# , text='n_accounts' --> this displays the Y-Axis values on the bars
fig = px.bar(negative_contributions,
x='user_created_at', y='n_accounts', text='n_accounts', title='Number of User Accounts Created in each year / having negative contributions in 2021')
fig.update_traces(marker_color='#5296dd',
marker_line_width=1.5, opacity=1, textposition='auto')
fig.show()
Which dates have the highest negative contributions?¶
# group by date and count
trendy_dates_negative = df_negative.groupby(df_negative['created_at'].dt.date).agg('count')['created_at'].to_frame()
# naming the count column as contribution_count
trendy_dates_negative.columns = ['contribution_count']
trendy_dates_negative.sort_values('contribution_count', ascending=False, inplace=True)
trendy_dates_negative = trendy_dates_negative.reset_index()
fig = px.bar(trendy_dates_negative,
x='created_at', y='contribution_count')
fig.update_layout(
title={
'text': "The number of negative contributions created in each date",
'x':0.5,
'xanchor': 'center',
'yanchor': 'top'
})
fig.update_traces(marker_color='#5296dd',
marker_line_width=1.5, opacity=1, textposition='auto').update_layout()
fig.show()
NOTE: a spike of negative contributions on Apr 17, 2021!
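The peak date can also be read off the data rather than the chart. A sketch with illustrative stand-in numbers in place of the real `trendy_dates_negative`:

```python
import pandas as pd

# Stand-in for trendy_dates_negative built above; counts are illustrative
trendy_dates_negative = pd.DataFrame({
    "created_at": pd.to_datetime(["2021-02-20", "2021-02-28", "2021-04-17"]),
    "contribution_count": [60, 55, 140],
})

# Row with the highest contribution count
peak = trendy_dates_negative.loc[trendy_dates_negative["contribution_count"].idxmax()]
print(peak["created_at"].date(), peak["contribution_count"])  # → 2021-04-17 140
```

On the real frame this pinpoints Apr 17, 2021 without relying on visual inspection of the bar chart.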
# get the top 10 trendy dates first, then sort them by date
top_trendy_negative = trendy_dates_negative.head(10)
top_trendy_negative = top_trendy_negative.sort_values('created_at').reset_index(drop=True)
fig = px.bar(top_trendy_negative, x="created_at", y="contribution_count",
             height=500,
             title='Highest negative contributions in 2021')
fig.update_traces(marker_color='#5296dd')
fig.update_layout(
    yaxis_title='Number of negative contributions',
    xaxis = dict(
        title='Contribution Date',
        tickmode = 'array',
        tickvals = top_trendy_negative.created_at,
    )
)
# check for the accounts that contributed on the peak day (Apr 17, 2021)
df_merged_negative[df_merged_negative.created_at.dt.strftime('%Y-%m-%d') == '2021-04-17']\
.user_created_at.dt.year.value_counts().sort_index().plot(kind='bar', figsize=(13,8));
# plt.gca().invert_xaxis()
plt.title('The creation year of the accounts that contributed on the peak day (Apr 17, 2021)');
plt.xlabel('Account Creation Year');
plt.ylabel('n_contributions');
# check for the accounts that contributed on the high-activity day (Feb 20, 2021)
df_merged_negative[df_merged_negative.created_at.dt.strftime('%Y-%m-%d') == '2021-02-20']\
.user_created_at.dt.year.value_counts().sort_index().plot(kind='bar', figsize=(13,8));
# plt.gca().invert_xaxis()
plt.title('The creation year of the accounts that contributed on the high-activity day (Feb 20, 2021)');
plt.xlabel('Account Creation Year');
plt.ylabel('n_contributions');
# check for the accounts that contributed on the high-activity day (Feb 28, 2021)
df_merged_negative[df_merged_negative.created_at.dt.strftime('%Y-%m-%d') == '2021-02-28']\
.user_created_at.dt.year.value_counts().sort_index().plot(kind='bar', figsize=(13,8));
# plt.gca().invert_xaxis()
plt.title('The creation year of the accounts that contributed on the high-activity day (Feb 28, 2021)');
plt.xlabel('Account Creation Year');
plt.ylabel('n_contributions');
# check for the hour these negative contributions were made
df_hours_negative = df_negative.groupby(df_negative['created_at'].dt.hour).size().reset_index(name='contribution_count')
df_hours_negative.sort_values('contribution_count', ascending=False);
fig = px.bar(df_hours_negative,
x='created_at', y='contribution_count',
title='Number of negative contributions (comments/submissions) by hour of day')
fig.update_layout(
xaxis = dict(
title='Hours of Day',
tickmode = 'linear',
dtick = 1
)
)
fig.update_traces(marker_color='#5296dd',
marker_line_width=1.5, opacity=1, textposition='auto').update_layout()
fig.show()
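One caveat for the hourly profile: Reddit's `created_utc` timestamps are in UTC, so if the `created_at` column here is naive UTC (an assumption about this scrape), the hours above are UTC hours. Converting to a local timezone before grouping can shift the shape of the distribution. A minimal sketch with stand-in timestamps:

```python
import pandas as pd

# Stand-in for df_negative['created_at'], assumed tz-naive UTC
created_at = pd.Series(pd.to_datetime([
    "2021-04-17 23:30:00", "2021-04-18 01:15:00", "2021-04-17 23:45:00",
]))

# Localize as UTC, then convert to a target timezone before extracting hours
local_hours = (created_at.dt.tz_localize("UTC")
                         .dt.tz_convert("US/Eastern")
                         .dt.hour)
print(local_hours.value_counts().sort_index())
```

The choice of `US/Eastern` is purely illustrative; the point is that a "late-night" UTC spike may correspond to daytime activity in the posters' actual timezone.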
On which days of the week did users post the most negative comments?
week_day_negative = df_negative['created_at'].dt.strftime('%a')
# one can sort by any order by providing a custom index explicitly:
# https://stackoverflow.com/questions/43855474/changing-sort-in-value-counts/43855492
week_day_negative_sorted = week_day_negative.value_counts()[['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']]
week_day_negative_sorted
Mon    252
Tue    298
Wed    190
Thu    232
Fri    206
Sat    516
Sun    322
Name: created_at, dtype: int64
week_day_negative_sorted.plot(kind='bar', color='#5296dd', figsize=(8,8), title='Negative contributions by day of the week');
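An alternative to indexing `value_counts()` with a hand-written day list is to declare the weekday names as an ordered categorical, so `sort_index()` follows calendar order and days with no data still show up (as zero). A sketch with a stand-in series:

```python
import pandas as pd

days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Stand-in for df_negative['created_at'].dt.strftime('%a')
week_day = pd.Series(["Sat", "Mon", "Sat", "Sun", "Tue"])

# Ordered categorical: sort_index() now sorts Mon..Sun, not alphabetically
counts = (week_day.astype(pd.CategoricalDtype(categories=days, ordered=True))
                  .value_counts()
                  .sort_index())
print(counts)
```

Unlike the hand-written-list approach, this does not raise a KeyError if some weekday happens to be absent from the data.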
# plt.gca().invert_xaxis()
# # ! --> used to run terminal/ command line expression in jupyter notebook
# !jupyter nbconvert New_2021_Analysis.ipynb --TagRemovePreprocessor.remove_input_tags='{"remove_cell"}' --to slides --no-input
# # downgraded nbconvert to 5.6 for this code to be able to use the output_toggle (hide code cells)
# !jupyter nbconvert New_2021_Analysis.ipynb --to slides --post serve --template output_toggle